Inter-Phone and Inter-Word Distances for Confusability Prediction in Speech Recognition

Authors

  • Jan Anguita
  • Javier Hernando
Abstract

In this work we investigate new inter-phone and inter-word distances and apply them to predict whether two words in the lexicon of an Automatic Speech Recognition (ASR) system are likely to be confused. The inter-word distance is calculated from an alignment between the phonetic transcriptions of the two words by summing the distances between the aligned phones. We propose a new solution in which the inter-phone distance used to compute the inter-word distance differs from the one used to compute the phonetic alignment. The former is calculated between the acoustic models of the phones with a new formula that we propose; the latter is based on phonetic knowledge. We also use two kinds of alignment: with and without insertions and deletions. To evaluate performance, we use a classical false acceptance/false rejection framework; the measured prediction Equal Error Rate (EER) was below 2%.
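The two-distance idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the proposed acoustic formula or the phonetic-knowledge distance, so `phonetic_dist`, `acoustic_dist`, the table `ACOUSTIC`, and the parameters `gap_cost` and `gap_penalty` are hypothetical placeholders with made-up values. The sketch covers only the alignment variant that allows insertions and deletions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # None marks an insertion/deletion
Dist = Callable[[str, str], float]

VOWELS = {"a", "e", "i", "o", "u"}  # toy phone classes (assumption)

def phonetic_dist(p: str, q: str) -> float:
    """Hypothetical knowledge-based distance, used only for the alignment."""
    if p == q:
        return 0.0
    return 0.5 if (p in VOWELS) == (q in VOWELS) else 1.0

# Hypothetical acoustic distances; the paper derives these from acoustic models.
ACOUSTIC = {("b", "p"): 0.3, ("d", "t"): 0.4}

def acoustic_dist(p: str, q: str) -> float:
    """Symmetric table lookup; unlisted pairs default to 1.0 (assumption)."""
    if p == q:
        return 0.0
    return ACOUSTIC.get((p, q), ACOUSTIC.get((q, p), 1.0))

def align(a: Sequence[str], b: Sequence[str], dist: Dist,
          gap_cost: float = 1.0) -> List[Pair]:
    """Minimum-cost alignment by dynamic programming, with gaps allowed."""
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + gap_cost
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + dist(a[i - 1], b[j - 1]),
                             cost[i - 1][j] + gap_cost,
                             cost[i][j - 1] + gap_cost)
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + dist(a[i - 1], b[j - 1])):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or cost[i][j] == cost[i - 1][j] + gap_cost):
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs

def inter_word_distance(a: Sequence[str], b: Sequence[str],
                        gap_penalty: float = 1.0) -> float:
    """Align with the phonetic distance, then sum the acoustic distance."""
    total = 0.0
    for p, q in align(a, b, phonetic_dist):
        if p is None or q is None:
            total += gap_penalty  # hypothetical cost of an unmatched phone
        else:
            total += acoustic_dist(p, q)
    return total

print(inter_word_distance(["b", "a", "t"], ["p", "a", "t"]))
print(inter_word_distance(["b", "a", "t"], ["b", "a", "t", "s"]))
```

Keeping the two distances as separate functions mirrors the separation described in the abstract: the alignment path is chosen with one measure, and the score accumulated along that path uses the other. A word pair would then be flagged as confusable when the resulting distance falls below a threshold, which is the decision the false acceptance/false rejection evaluation varies.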

Related resources

Word confusability prediction in automatic speech recognition

A new method to predict if two words are likely to be confused by an Automatic Speech Recognition (ASR) system is presented in this paper. A new inter-word dissimilarity measure based on Dynamic Time Warping (DTW) is used to classify the word pairs as confusable or not confusable. Firstly, the phonetic transcriptions of the two words to compare are aligned using only phonetic information. After...

Word confusability - measuring hidden Markov model similarity

We address the problem of word confusability in speech recognition by measuring the similarity between Hidden Markov Models (HMMs) using a number of recently developed techniques. The focus is on defining a word confusability that is accurate, in the sense of predicting artificial speech recognition errors, and computationally efficient when applied to speech recognition applications. It is sho...

Envelope-based inter-aural time difference localization training to improve speech-in-noise perception in the elderly

Background: Many elderly individuals complain of difficulty in understanding speech in noise despite having normal hearing thresholds. According to previous studies, auditory training leads to improvement in speech-in-noise perception, but these studies did not consider the etiology, so their results cannot be generalized. The present study aimed at investigating the effectiveness of envelope-b...

Modelling pronunciation variations in spontaneous Mandarin speech

Pronunciation in spontaneous Mandarin speech tends to be much more variable than in read speech. In current recognition systems, pronunciation dictionaries usually only contain one standard pronunciation for each word, so that the amount of variability that can be modelled is very limited. Most recent research work for modelling variations in spontaneous speech focuses on the lexicon level, whi...

Intra-speaker variation and units in human speech perception and ASR

Research on speech perception and ASR has resulted in several important advances in our understanding of speech variation: one is that speaker dependent variation is systematic, another is that inter-speaker and intra-speaker variation diverge in their root causes and characteristics. Therefore, a successful approach to one may not always transfer to the other. Intertalker variation, or indexical ...

Journal title:
  • Procesamiento del Lenguaje Natural

Volume 33

Publication date: 2004